Abstract

The skyline of a set of points in the plane is the subset of maximal points, where a point $(x, y)$ is maximal if no other point $(x', y')$ satisfies $x' \ge x$ and $y' \ge y$. We consider the problem of preprocessing a set $P$ of $n$ points into a space efficient static data structure supporting orthogonal skyline counting queries, i.e. given a query rectangle $R$, report the size of the skyline of $P \cap R$. We present a data structure for storing $n$ points with integer coordinates having query time $O(\lg n/\lg\lg n)$ and space usage $O(n)$ words. The model of computation is a unit cost RAM with logarithmic word size. We prove that these bounds are the best possible by presenting a lower bound in the cell probe model with logarithmic word size: space usage $n\lg^{O(1)} n$ implies worst case query time $\Omega(\lg n/\lg\lg n)$.




1 Introduction

In this paper we consider orthogonal range skyline queries for a set of points in the plane. A point $(x, y) \in \mathbb{R}^2$ dominates a point $(x', y')$ if and only if $x' \le x$ and $y' \le y$. For a set of points $P$, a point $p \in P$ is maximal if no other point in $P$ dominates $p$, and the skyline of $P$, $\mathrm{Skyline}(P)$, is the subset of maximal points in $P$.

We consider the problem of preprocessing a set $P$ of $n$ points in the plane with integer coordinates into a data structure to support orthogonal range skyline counting queries: given an axis-aligned query rectangle $R = [x_1, x_2] \times [y_1, y_2]$, report the size of the skyline of the subset of the points from $P$ contained in $R$, i.e. report $|\mathrm{Skyline}(P \cap R)|$. The main results of this paper are matching upper and lower bounds for data structures supporting such queries, thus completely settling the problem. Our model of computation is the standard unit cost RAM with logarithmic word size.
1.1 Previous Work

Orthogonal range searching is one of the most fundamental and well-studied topics in computational geometry, see e.g. [4] for an extensive list of previous results. For orthogonal range queries in the plane, with integer coordinates in $[n] \times [n] = \{0, \ldots, n-1\} \times \{0, \ldots, n-1\}$, the main results are the following. For the orthogonal range counting problem, i.e. queries report the total number of input points inside a query rectangle, optimal $O(\lg n/\lg\lg n)$ query time using $O(n)$ space was achieved in [8]. Optimality was shown in [13], where it was proved that space $n\lg^{O(1)} n$ implies query time $\Omega(\lg n/\lg\lg n)$ for range counting queries.
Space | Query time | Reference
$n\lg^{?} n/\lg\lg n$ | $\lg^{?} n/\lg\lg n$ | [?]
$n\lg n$ | $\lg n$ | [9]
$n\lg^{?} n/\lg\lg n$ | $\lg n/\lg\lg n$ | [6]
$n$ | $\lg n/\lg\lg n$ | New

Table 1: Previous and new results for skyline counting queries.

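Space | Query time | Reference
$n\lg n$ | $\lg^{?} n + k$ | [3] (dynamic)
$n\lg n$ | $\lg n + k$ | [10, 9]
$n\lg n/\lg\lg n$ | $\lg n/\lg\lg n + k$ | [5]
$n\lg^{\varepsilon} n$ | $(k+1)\lg\lg n$ | [11]
$n\lg^{\varepsilon} n$ | $\lg n/\lg\lg n + k$ | New
$n\lg\lg n$ | $(k+1)(\lg\lg n)^2$ | [11]
$n\lg\lg n$ | $\lg n/\lg\lg n + k\lg\lg n$ | New
$n$ | $(k+1)\lg^{\varepsilon} n$ | [11]

Table 2: Results for skyline reporting queries.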



For range reporting queries it is known that space $n\lg^{O(1)} n$ implies query time $\Omega(\lg\lg n + k)$, where $k$ is the number of points reported within the query range [14]. The best upper bounds known for range reporting are: optimal space $O(n)$ and query time $O((k+1)\lg^{\varepsilon} n)$ [?], and optimal query time $O(\lg\lg n + k)$ with space $O(n\lg^{\varepsilon} n)$ [1]. In both cases $\varepsilon > 0$ is an arbitrarily small constant.

Orthogonal Range Skyline Queries. Orthogonal range skyline counting queries were first considered in [?]. Here a data structure with space usage $O(n\lg^{?} n/\lg\lg n)$ and query time $O(\lg^{?} n/\lg\lg n)$ was presented. This was subsequently improved to $O(n\lg n)$ space and $O(\lg n)$ query time [9]. Finally, a data structure achieving an even faster query time of $O(\lg n/\lg\lg n)$ was presented; however, the space usage of that solution was a prohibitive $O(n\lg^{?} n/\lg\lg n)$ [6]. Thus to date, no linear space solution exists with a non-trivial query time. Also, from a lower bound perspective, it is not known whether the problem is easier or harder than the standard range counting problem.

For orthogonal skyline reporting queries, the best bound is $O(n\lg n/\lg\lg n)$ space with query time $O(\lg n/\lg\lg n + k)$ [5], where $k$ is the size of the reported skyline. Note that an $\Omega(\lg\lg n)$ search term is needed for skyline range reporting, since the $\Omega(\lg\lg n)$ lower bound for standard range reporting was proved even for the case of determining whether the query rectangle is empty [14].

In [11] solutions for the sorted range reporting problem were presented, i.e. the problem of reporting the $k$ leftmost points within a query rectangle in sorted order of increasing $x$-coordinate. With space $O(n)$, $O(n\lg\lg n)$ and $O(n\lg^{\varepsilon} n)$, query times $O((k+1)\lg^{\varepsilon} n)$, $O((k+1)(\lg\lg n)^2)$ and $O(k + \lg\lg n)$ were achieved, respectively. The structures of [11] support finding the rightmost (skyline) point in a query range ($k = 1$). By recursing on the rectangle above the reported point one immediately gets the bounds for skyline reporting listed in Table 2, where only the linear space solution achieves query times matching those of general orthogonal range reporting.
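The recursion from skyline reporting to repeated $k = 1$ queries can be sketched as follows. This is only an illustration under stated assumptions: `rightmost_in_rect` is a hypothetical stand-in for a rightmost-point query (here a linear scan, not the structures of [11]), coordinates are integers with distinct $y$-values, and the recursion is written as a loop. Each reported point $p$ is the rightmost point of the current rectangle, so every remaining skyline point lies strictly above it.

```python
from typing import List, Optional, Tuple

Point = Tuple[int, int]

def rightmost_in_rect(points: List[Point], x1: int, x2: int,
                      y1: int, y2: int) -> Optional[Point]:
    """Hypothetical k = 1 oracle: the rightmost point in [x1, x2] x [y1, y2]."""
    best = None
    for (x, y) in points:
        if x1 <= x <= x2 and y1 <= y <= y2 and (best is None or x > best[0]):
            best = (x, y)
    return best

def skyline_report(points: List[Point], x1: int, x2: int,
                   y1: int, y2: int) -> List[Point]:
    """Report the skyline of the points in [x1, x2] x [y1, y2], bottom to top."""
    result = []
    while y1 <= y2:
        p = rightmost_in_rect(points, x1, x2, y1, y2)
        if p is None:
            break
        result.append(p)
        y1 = p[1] + 1   # recurse on the part of the rectangle above p
    return result
```

Each iteration issues one $k = 1$ query (plus one final empty query), which is where the $(k + 1)$ factor in the bounds derived from [11] comes from.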

Previous results for skyline queries are summarized in Tables 1 and 2.



1.2 Our Results 

In Section 2 we present a linear space data structure supporting orthogonal range skyline counting queries in $O(\lg n/\lg\lg n)$ time, thus for the first time achieving linear space. At the same time, we improve on all the best previous tradeoffs. Furthermore, in Section 3 we show that this is the best possible by proving a matching lower bound. More specifically, we prove a lower bound stating that the query time $t$ must satisfy $t = \Omega(\lg n/\lg(Sw/n))$. Here $S \ge n$ is the space usage in number of words and $w = \Omega(\lg n)$ is the word size in bits. For $w = O(\lg n)$ and $S = n\lg^{O(1)} n$, this bound becomes $t = \Omega(\lg n/\lg\lg n)$. The lower bound is proved in the cell probe model of Yao [18], which is more powerful than the unit cost RAM, and hence the lower bound also applies to RAM data structures.

As a side result, in Section 4 we also show how to modify our counting data structure to support reporting queries. Our reporting data structure has query time $O(\lg n/\lg\lg n + k)$ and space usage $O(n\lg^{\varepsilon} n)$. The best previous reporting structure with a linear term in $k$ has $O(\lg n/\lg\lg n + k)$ query time but $O(n\lg n/\lg\lg n)$ space [5]. The reporting structure can also be modified to achieve $O(\lg n/\lg\lg n + k\lg\lg n)$ query time and $O(n\lg\lg n)$ space. See Table 2 for a comparison to previous results.

Our upper bounds are achieved by constructing a balanced search tree of degree $O(\lg^{\varepsilon} n)$ over the points sorted by $x$-coordinate. At each internal node we store several rank-select data structures storing the points in the subtrees sorted by rank-reduced $y$-coordinates. Using a constant number of global tables, queries only spend $O(1)$ time at each level of the tree. Our lower bound follows from a reduction of reachability in butterfly graphs to two-sided skyline counting queries, extending reductions by Pătraşcu [12] for two-dimensional rectangle stabbing and range counting queries.

1.3 Preliminaries 

Coordinates. If the coordinates of the input and query points are not restricted to $[n] \times [n]$, but can be arbitrary integers that fit into a machine word, then we can map the coordinates to the range $[n]$ by using the RAM dictionary from [?], which can support predecessor queries on the lexicographical orderings of the points in time $O(\sqrt{\lg n/\lg\lg n})$ using $O(n)$ space. This is less than the $O(\lg n/\lg\lg n)$ query time we are aiming for.
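As a simple illustration of this coordinate mapping, the sketch below reduces arbitrary integer coordinates to ranks in $[n]$ and maps a query interval to the corresponding rank interval. It uses binary search from Python's `bisect` module, which takes $O(\lg n)$ time per query; this is a stand-in, not the $O(\sqrt{\lg n/\lg\lg n})$ dictionary cited above. The helper names are hypothetical, and distinct $x$- and $y$-coordinates are assumed to keep the example short.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple

def rank_reduce(points: List[Tuple[int, int]]):
    """Map each point to (rank of x, rank of y) in [n] x [n].

    Assumes distinct x- and distinct y-coordinates (simplification).
    """
    xs = sorted(p[0] for p in points)
    ys = sorted(p[1] for p in points)
    reduced = [(bisect_left(xs, x), bisect_left(ys, y)) for (x, y) in points]
    return reduced, xs, ys

def reduce_interval(sorted_coords: List[int], lo: int, hi: int) -> Tuple[int, int]:
    """Map a query interval [lo, hi] to the rank interval [a, b]; empty if a > b."""
    a = bisect_left(sorted_coords, lo)        # first rank with coordinate >= lo
    b = bisect_right(sorted_coords, hi) - 1   # last rank with coordinate <= hi
    return a, b
```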

Succinct Data Structures. In our solutions, we make extensive use of the following results from succinct 
data structures. 

Lemma 1 ([15]) A vector $X[1..s]$ of $s$ zero-one values, with $t$ values equal to one, can be stored in a data structure of size $O(t(1 + \lg(s/t)))$ bits supporting rank and select queries in $O(1)$ time. A rank($i$) query returns the number of ones in $X[1..i]$, provided $X[i] = 1$, whereas a select($i$) query returns the position of the $i$th one in $X$.
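The query semantics of Lemma 1 can be illustrated with a plain stand-in. The sketch below is not succinct: it stores $O(s)$ words (a prefix-count array and the positions of the ones) instead of $O(t(1 + \lg(s/t)))$ bits, but it answers the same 1-indexed rank and select queries in $O(1)$ time. The class name is hypothetical.

```python
from itertools import accumulate
from typing import List

class RankSelect:
    """Plain stand-in for Lemma 1: same 1-indexed queries, but O(s) words
    of space instead of O(t(1 + lg(s/t))) bits."""

    def __init__(self, bits: List[int]):
        self.prefix = [0] + list(accumulate(bits))            # prefix[i] = ones in X[1..i]
        self.ones = [i + 1 for i, b in enumerate(bits) if b]   # positions of the ones

    def rank(self, i: int) -> int:
        """Number of ones in X[1..i] (Lemma 1 only promises this when X[i] = 1)."""
        return self.prefix[i]

    def select(self, i: int) -> int:
        """Position of the i'th one in X."""
        return self.ones[i - 1]
```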

Lemma 2 ([16]) Let $X[1..s]$ be a vector of $s$ non-negative integers with total sum $t$. There exists a data structure of size $O(s\lg(2 + t/s))$ bits, supporting the lookup of $X[i]$ and the prefix sum $\sum_{j=1}^{i} X[j]$ in $O(1)$ time, for $i = 1, \ldots, s$.
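The interface of Lemma 2 can be mirrored by the following minimal stand-in, which keeps the values and their prefix sums in plain lists. It uses $O(s)$ words rather than $O(s\lg(2 + t/s))$ bits; only the lookup and prefix-sum semantics are illustrated, and the class name is hypothetical.

```python
from itertools import accumulate
from typing import List

class PrefixSums:
    """Stand-in for Lemma 2 with the same interface, using O(s) words
    rather than O(s lg(2 + t/s)) bits."""

    def __init__(self, values: List[int]):
        if any(v < 0 for v in values):
            raise ValueError("Lemma 2 is stated for non-negative integers")
        self.values = list(values)
        self.prefix = [0] + list(accumulate(self.values))

    def lookup(self, i: int) -> int:
        """X[i], 1-indexed."""
        return self.values[i - 1]

    def prefix_sum(self, i: int) -> int:
        """X[1] + ... + X[i]; prefix_sum(0) == 0."""
        return self.prefix[i]
```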

Lemma 3 ([7, 17]) Let $X[1..s]$ be a vector of integers. There exists a data structure of size $O(s)$ bits supporting range-maximum-queries in $O(1)$ time, i.e. given $i$ and $j$, $1 \le i \le j \le s$, it reports the index $k$, $i \le k \le j$, such that $X[k] = \max(X[i..j])$. Queries only access this data structure, i.e. the vector $X$ is not stored.
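For experimentation, the range-maximum interface of Lemma 3 can be emulated with a classic sparse table. This is a different, simpler technique: it answers queries in $O(1)$ time but uses $O(s\lg s)$ words and keeps $X$, whereas Lemma 3 needs only $O(s)$ bits and discards $X$. Ties are resolved toward the leftmost index; the class name is hypothetical.

```python
from typing import List

class RangeMax:
    """Sparse-table range-maximum index structure (1-indexed, inclusive [i, j]).

    O(1) queries, but O(s lg s) words and X is kept, unlike Lemma 3.
    """

    def __init__(self, x: List[int]):
        self.x = list(x)
        n = len(self.x)
        self.table = [list(range(n))]   # table[p][a] = argmax of x[a .. a + 2^p - 1]
        p = 1
        while (1 << p) <= n:
            prev, half = self.table[-1], 1 << (p - 1)
            row = []
            for a in range(n - (1 << p) + 1):
                l, r = prev[a], prev[a + half]
                row.append(l if self.x[l] >= self.x[r] else r)
            self.table.append(row)
            p += 1

    def query(self, i: int, j: int) -> int:
        """Return k in [i, j] with X[k] = max(X[i..j]) (leftmost on ties)."""
        a, b = i - 1, j - 1
        p = (b - a + 1).bit_length() - 1   # largest p with 2^p <= length
        l, r = self.table[p][a], self.table[p][b - (1 << p) + 1]
        return (l if self.x[l] >= self.x[r] else r) + 1
```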



2 Data Structure 

In this section we describe a data structure using $O(n)$ space supporting orthogonal skyline counting queries in $O(\lg n/\lg\lg n)$ time.

We let $\Delta = \max\{2, \lceil\lg^{\varepsilon} n\rceil\}$ be a parameter of our construction, where $0 < \varepsilon < 1/3$ is a constant. We build a balanced base tree $T$ over the set of points $P$, where the leaves from left to right store the points in $P$ in sorted order w.r.t. $x$-coordinate. Each internal node of $T$ has degree at most $\Delta$ and $T$ has height $\lceil\log_\Delta n\rceil + 1$. See Figure 1.
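The construction parameters can be computed with a small helper. This is a sketch with a hypothetical name: $\lg^{\varepsilon} n$ is evaluated in floating point, which is adequate for illustration, while $\lceil\log_\Delta n\rceil$ is computed exactly with an integer loop.

```python
import math
from typing import Tuple

def tree_parameters(n: int, eps: float) -> Tuple[int, int]:
    """Return (Delta, height) with Delta = max{2, ceil(lg^eps n)} and
    height = ceil(log_Delta n) + 1."""
    if n < 2 or not 0 < eps < 1 / 3:
        raise ValueError("expects n >= 2 and 0 < eps < 1/3")
    delta = max(2, math.ceil(math.log2(n) ** eps))   # floating point, fine for a sketch
    levels, capacity = 0, 1
    while capacity < n:                              # exact ceil(log_Delta n)
        capacity *= delta
        levels += 1
    return delta, levels + 1
```

Since $\lg\Delta = \Theta(\varepsilon\lg\lg n)$ for large $n$, the height is $O(\lg n/\lg\lg n)$, which is the source of the query bound.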
Figure 1: The base tree $T$ with $\Delta = 4$, and the decomposition of a query into a sequence of multislab queries $R_1, \ldots, R_m$. White points are nodes on the skyline within $R$. The double circled points are the topmost points within each of the multislabs.

For each internal node $v$ of $T$ we store a set of data structures. Before describing these we need to introduce some notation. The subtree of $T$ rooted at a node $v$ is denoted $T_v$, and the set of points stored at the leaves of $T_v$ is denoted $P_v$. We let $n_v = |P_v|$ and $L_v[1..n_v]$ be the list of the points in $P_v$ sorted in increasing $y$-order. We let $I_v = [\ell_v, r_v]$ denote the $x$-interval defined by the $x$-coordinates of the points stored at the leaves of $T_v$, and denote by $I_v \times [n]$ the slab spanned by $v$. The degree of $v$ is denoted $d_v$, the children of $v$ are from left to right denoted $c_v^1, \ldots, c_v^{d_v}$, and the parent of node $v$ is denoted $p_v$. A list $L_v$ is partitioned into a sequence of blocks $B_v[1..\lceil n_v/\Delta^2\rceil]$ of size $\Delta^2$, such that $B_v[i] = L_v[(i-1)\Delta^2 + 1 .. \min\{n_v, i\Delta^2\}]$. The signature $\sigma_v[i]$ of a block $B_v[i]$ is a list of pairs: for each point $p$ in $B_v[i]$ we construct a pair $(j, r)$, where $j$ is the index of the child $c_v^j$ of $v$ storing $p$ and $r$ is the rank of $p$'s $x$-coordinate among all points in $B_v[i]$ stored at the same child $c_v^j$ as $p$. The total number of bits required for a signature is at most $\Delta^2(\lg\Delta + \lg\Delta^2) = O(\lg^{2\varepsilon} n \cdot \lg\lg n)$.
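A block signature can be computed as in the following sketch. The input format is an assumption made for illustration: the block's points are given in increasing $y$-order as `(x, child)` pairs with children numbered $1..\Delta$, and $x$-coordinates are distinct. Because child slabs are disjoint and ordered left to right, the pair $(j, r)$ alone determines each point's $x$-order within the block.

```python
from typing import Dict, List, Tuple

def block_signature(block: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Signature of one block B_v[i].

    `block` lists the block's points in increasing y-order as (x, child)
    pairs (hypothetical input format, distinct x assumed). The output has
    one pair (j, r) per point: j is the child storing it and r is the
    1-based rank of its x-coordinate among the block's points in child j.
    """
    xs_per_child: Dict[int, List[int]] = {}
    for x, child in block:
        xs_per_child.setdefault(child, []).append(x)
    for xs in xs_per_child.values():
        xs.sort()
    return [(child, xs_per_child[child].index(x) + 1) for x, child in block]
```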

To achieve overall $O(n)$ space we need to succinctly encode sufficient information for performing queries. In particular, we will not store the points in $L_v$ explicitly at the node $v$; only partial information about the points' relative positions will be stored.

Queries on a block $B_v[i]$ are handled using table lookups in global tables using the block signature $\sigma_v[i]$.



We have tables for the following block queries, where we assume $\sigma$ is the signature of a block storing points $p_1, \ldots, p_{\Delta^2}$ distributed in $\Delta$ child slabs.

Below($\sigma, t, i$) Returns the number of points from $p_1, \ldots, p_t$ contained in slab $i$.

Rightmost($\sigma, b, t, i, j$) Returns $k$, where $p_k$ is the rightmost point among $p_b, \ldots, p_t$ contained in slabs $[i, j]$. If no such point exists, $-1$ is returned.

Topmost($\sigma, b, t, i, j$) Returns $k$, where $p_k$ is the topmost point among $p_b, \ldots, p_t$ contained in slabs $[i, j]$. If no such point exists, $-1$ is returned.

SkyCount($\sigma, b, t, i, j$) Returns the size of the skyline for the subset of the points $p_b, \ldots, p_t$ contained in slabs $[i, j]$.

The arguments to each of the above lookups consist of at most $|\sigma| + 2\lg\Delta^2 + 2\lg\Delta = |\sigma| + O(\lg\lg n) = O(\lg^{2\varepsilon} n \cdot \lg\lg n)$ bits and the answer is $O(\lg\Delta) = O(\lg\lg n)$ bits, i.e. each query can be answered in $O(1)$ time using a table of size $2^{O(\lg^{2\varepsilon} n \cdot \lg\lg n)} \cdot O(\lg\lg n) = o(n)$ bits, since $\varepsilon < 1/3$.
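The global tables can be pictured as precomputed functions of the signature. The sketch below computes SkyCount directly from a signature encoded as a tuple of `(child, rank)` pairs (an encoding assumed for readability). The `functools.lru_cache` memoization only imitates a lookup table; the construction above instead precomputes the answer for every possible argument bit string.

```python
from functools import lru_cache
from typing import Tuple

Signature = Tuple[Tuple[int, int], ...]  # ((child, rank), ...) in increasing y-order

@lru_cache(maxsize=None)
def sky_count(sigma: Signature, b: int, t: int, i: int, j: int) -> int:
    """SkyCount(sigma, b, t, i, j): skyline size of p_b..p_t restricted to slabs [i, j].

    A (child, rank) pair orders points by x, since child slabs are disjoint
    and ordered left to right. Scanning from the top down, a point is on the
    skyline iff it lies to the right of every qualifying point above it.
    """
    count, best = 0, None
    for child, rank in reversed(sigma[b - 1:t]):
        if i <= child <= j and (best is None or (child, rank) > best):
            count += 1
            best = (child, rank)
    return count
```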

For each internal node $v$ of $T$ we store the following data structures, each having $O(1)$ access time.

$C_v(i)$ Compact array that for each $i$, where $1 \le i \le n_v$, stores the index of the child of $v$ storing $L_v[i]$, i.e. $1 \le C_v(i) \le \Delta$. Space usage $O(n_v\lg\Delta)$ bits.

$\pi_v(i)$ For each $i$, $1 \le i \le n_v$, stores the index of $L_v[i]$ in $L_{p_v}$, i.e. $L_{p_v}[\pi_v(i)] = L_v[i]$. This can be supported by constructing the select data structure of Lemma 1 on the bit-vector $X$, where $X[i] = 1$ if and only if $L_{p_v}[i]$ is in $L_v$. A query to $\pi_v(i)$ simply becomes a select($i$) query. Space usage $O(n_v\lg(n_{p_v}/n_v)) = O(n_v\lg\Delta)$ bits.

$\sigma_v(i)$ Array of signatures for the blocks $B_v[1..\lceil n_v/\Delta^2\rceil]$. Space usage $O(n_v/\Delta^2 \cdot \Delta^2 \cdot \lg\Delta) = O(n_v\lg\Delta)$ bits.

$\mathrm{Pred}_v(t, i)$ / $\mathrm{Succ}_v(t, i)$ Supports finding the predecessor/successor of $L_v[t]$ in the $i$th child list $L_{c_v^i}$. Returns $\max\{k \mid 1 \le k \le n_{c_v^i} \wedge \pi_{c_v^i}(k) \le t\}$ and $\min\{k \mid 1 \le k \le n_{c_v^i} \wedge \pi_{c_v^i}(k) \ge t\}$, respectively. For each child index $i$, we construct an array $X^i$ of size $\lceil n_v/\Delta^2\rceil$, such that $X^i[b]$ is the number of points in block $B_v[b]$ that are stored in the $i$th child slab. The prefix sums of each $X^i$ are stored using the data structure of Lemma 2 using space $O((n_v/\Delta^2)\lg(\Delta^2))$ bits. The total space for all $\Delta$ children of $v$ becomes $O(\Delta \cdot n_v/\Delta^2 \cdot \lg\Delta) = O(n_v)$ bits. The result of a $\mathrm{Pred}_v(t, i)$ query is
$$\sum_{j=1}^{\lceil t/\Delta^2\rceil - 1} X^i[j] + \mathrm{Below}(\sigma_v(\lceil t/\Delta^2\rceil), 1 + ((t - 1) \bmod \Delta^2), i),$$
where the first term can be computed in $O(1)$ time by Lemma 2 and the second term is a constant time global table lookup. The result of $\mathrm{Succ}_v(t, i)$ is $\mathrm{Pred}_v(t, i)$ if $C_v(t) = i$; otherwise $\mathrm{Succ}_v(t, i) = \mathrm{Pred}_v(t, i) + 1$.

$\mathrm{Rightmost}_v(i, j)$ Returns the index $k$, where $i \le k \le j$, such that $L_v[k]$ has the maximum $x$-value among $L_v[i..j]$. Using Lemma 3 on the array of the $x$-coordinates of the points in $L_v$ we achieve $O(1)$ time queries and space usage $O(n_v)$ bits.

$\mathrm{SkyCount}_v(i)$ Returns $|\mathrm{Skyline}(L_v[1..i])|$. Construct an array $X$, where $X[i]$ is the number of points in $\mathrm{Skyline}(L_v[1..i-1])$ dominated by $L_v[i]$. See Figure 2. We can now compute $|\mathrm{Skyline}(L_v[1..i])|$ as $i - \sum_{j=1}^{i} X[j]$. Using Lemma 2 the query time becomes $O(1)$ and the space usage $O(n_v)$ bits, since $\sum_{j=1}^{n_v} X[j] \le n_v$ (each point leaves the skyline at most once).
Figure 2: Computation of $|\mathrm{Skyline}(L_v[1..6])|$. To the right of each point $L_v[i]$ is shown the number of points in $\mathrm{Skyline}(L_v[1..i-1])$ dominated by $L_v[i]$. The skyline of $L_v[1..6]$ consists of the three white nodes. $|\mathrm{Skyline}(L_v[1..6])| = 6 - 2 - 0 - 0 - 0 - 1 - 0 = 3$.
Figure 3: Illustration of $\mathrm{SkyCount}_v(i, j)$. White and black circles and crosses are all points. $L_v[k]$ is the rightmost point in $L_v[i..j]$. Crosses indicate $\mathrm{Skyline}(L_v[i..j])$, white circles indicate $\mathrm{Skyline}(L_v[1..k])$, and white circles together with crosses form $\mathrm{Skyline}(L_v[1..j])$.

$\mathrm{SkyCount}_v(i, j)$ Returns $|\mathrm{Skyline}(L_v[i..j])|$, computable by the following expression (see Figure 3):
$$\mathrm{SkyCount}_v(j) - \mathrm{SkyCount}_v(\mathrm{Rightmost}_v(i, j)) + 1.$$
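The two prefix-based counts $\mathrm{SkyCount}_v(i)$ and $\mathrm{SkyCount}_v(i, j)$ can be checked with the following plain-list sketch, assuming distinct $x$-coordinates. The function name is hypothetical: it fills $X$ with a stack-based sweep over $L_v$ in increasing $y$-order, keeps prefix sums in a list instead of the Lemma 2 structure, and uses a linear scan in place of the Lemma 3 range-maximum structure.

```python
from itertools import accumulate
from typing import Callable, List, Tuple

def build_skycount(xs: List[int]) -> Tuple[Callable[[int], int], Callable[[int, int], int]]:
    """xs[k-1] is the x-coordinate of L_v[k] (points in increasing y-order).

    X[i] = number of points of Skyline(L_v[1..i-1]) dominated by L_v[i], and
    SkyCount_v(i) = i - (X[1] + ... + X[i]). The current skyline is kept as a
    stack whose x-coordinates decrease from bottom to top.
    """
    stack, x_arr = [], []
    for x in xs:
        removed = 0
        while stack and stack[-1] < x:   # the new topmost point dominates these
            stack.pop()
            removed += 1
        stack.append(x)
        x_arr.append(removed)
    prefix = [0] + list(accumulate(x_arr))

    def skycount(i: int) -> int:
        return i - prefix[i]

    def rightmost(i: int, j: int) -> int:   # naive stand-in for Lemma 3
        return max(range(i, j + 1), key=lambda k: xs[k - 1])

    def skycount_range(i: int, j: int) -> int:
        return skycount(j) - skycount(rightmost(i, j)) + 1

    return skycount, skycount_range
```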

Finally, we store for each node v and slab interval [i, j] the following data structures. 

$\mathrm{Rightmost}_{v,i,j}(b, t)$ Returns $k$, where $L_v[k]$ is the rightmost point among the points in blocks $B_v[b..t]$ contained in slabs $[i, j]$. If no such point exists, $-1$ is returned. Can be solved by applying Lemma 3 to the array $X$, where $X[s]$ is the $x$-coordinate of the rightmost point in $B_v[s]$ contained in slabs $[i, j]$. A query first finds the block $\ell$ containing the rightmost point using this data structure, and then returns $(\ell - 1)\Delta^2 + \mathrm{Rightmost}(\sigma_v[\ell], 1, \Delta^2, i, j)$. Space usage $O(n_v/\Delta^2)$ bits.

$\mathrm{Topmost}_{v,i,j}(b, t)$ Returns $k$, where $L_v[k]$ is the topmost point among the points in blocks $B_v[b..t]$ contained in slabs $[i, j]$. If no such point exists, $-1$ is returned. Can be solved by first using Lemma 3 on the array $X$, where $X[s] = s$ if there exists a point in $B_v[s]$ contained in slabs $[i, j]$, and $X[s] = 0$ otherwise. Let $\ell$ be the block found using Lemma 3. Return the result of $(\ell - 1)\Delta^2 + \mathrm{Topmost}(\sigma_v[\ell], 1, \Delta^2, i, j)$. Space usage $O(n_v/\Delta^2)$ bits.

$\mathrm{SkyCount}_{v,i,j}(b, t)$ Returns the size of the skyline for the subset of points in blocks $B_v[b..t]$ contained in slabs $[i, j]$. Can be supported by two applications of Lemma 2 on two arrays $X$ and $Y$ as follows. Let $X[s] = \mathrm{SkyCount}(\sigma_v[s], 1, \Delta^2, i, j)$, i.e. the size of the skyline of the points in block $B_v[s]$ contained in slabs $[i, j]$. Let $B_{v,i,j}[s]$ denote the points in $B_v[s]$ contained in slabs $[i, j]$. Let $Y[s] = |\mathrm{Skyline}(B_{v,i,j}[1..s-1]) \setminus \mathrm{Skyline}(B_{v,i,j}[1..s])|$, i.e. the number of points on $\mathrm{Skyline}(B_{v,i,j}[1..s-1])$ dominated by points in $B_{v,i,j}[s]$. Space usage for $X$ and $Y$ is $O(n_v/\Delta^2 \cdot \lg\Delta^2)$ bits. We can compute
$$\mathrm{SkyCount}_{v,i,j}(b, t) = \sum_{s=k}^{t} X[s] - \sum_{s=k+1}^{t} Y[s],$$
where $k = \lceil \mathrm{Rightmost}_{v,i,j}(b, t)/\Delta^2\rceil$.
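The block-level formula for $\mathrm{SkyCount}_{v,i,j}(b, t)$ can be checked against a brute-force skyline with the sketch below. It assumes the blocks are already restricted to slabs $[i, j]$ (i.e. they are the $B_{v,i,j}[s]$), that points have distinct coordinates, and that every point of block $s$ lies below every point of block $s + 1$. $X$ and $Y$ are recomputed directly rather than stored with Lemma 2; the function names are hypothetical.

```python
from typing import List, Set, Tuple

Point = Tuple[int, int]

def skyline(pts: List[Point]) -> Set[Point]:
    """Brute-force maximal points (distinct coordinates assumed)."""
    return {p for p in pts
            if not any(q != p and q[0] >= p[0] and q[1] >= p[1] for q in pts)}

def skycount_blocks(blocks: List[List[Point]], b: int, t: int) -> int:
    """SkyCount_{v,i,j}(b, t) = sum_{s=k}^{t} X[s] - sum_{s=k+1}^{t} Y[s].

    blocks[s-1] holds B_{v,i,j}[s]; k is the block holding the rightmost
    point among blocks b..t.
    """
    X = [len(skyline(blk)) for blk in blocks]
    Y = []
    for s in range(1, len(blocks) + 1):
        before = [p for blk in blocks[:s - 1] for p in blk]
        Y.append(len(skyline(before) - skyline(before + blocks[s - 1])))
    candidates = [(p[0], s) for s in range(b, t + 1) for p in blocks[s - 1]]
    if not candidates:
        return 0
    k = max(candidates)[1]             # block holding the rightmost point
    return sum(X[k - 1:t]) - sum(Y[k:t])
```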



The total space of our data structure, in addition to the $o(n)$ bits for our global tables, can be bounded as follows. The total space for all $O(\Delta^2)$ multislab data structures for a node $v$ is $O(\Delta^2 \cdot n_v/\Delta^2 \cdot \lg\Delta)$ bits. The total space for all data structures at a node $v$ becomes $O(n_v\lg\Delta)$ bits. Since the sum of all $n_v$ for a level of $T$ is at most $n$, the total space for all nodes at a level of $T$ is $O(n\lg\Delta)$ bits. Since $T$ has height $O(\log_\Delta n)$, the total space usage becomes $O(n\lg\Delta \cdot \log_\Delta n) = O(n\lg n)$ bits, i.e. $O(n)$ words.
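The $\mathrm{Pred}_v$/$\mathrm{Succ}_v$ formulas given earlier for the node structures can be sketched as follows, with plain Python lists in place of Lemma 2 and a scan in place of the global Below table. This is illustrative only: `c` is a hypothetical 0-indexed list holding $C_v$, `block_size` stands for $\Delta^2$, and when no child-$i$ point lies at or above position $t$ the successor comes out as $n_{c_v^i} + 1$.

```python
from itertools import accumulate
from typing import List

class PredSucc:
    """Pred_v / Succ_v from per-block child counts plus an in-block Below count."""

    def __init__(self, c: List[int], block_size: int, degree: int):
        self.c, self.s = c, block_size
        nblocks = -(-len(c) // block_size)
        # prefix[i][b] = X^i[1] + ... + X^i[b], X^i[b] = #points of block b in child i
        self.prefix = {}
        for i in range(1, degree + 1):
            counts = [sum(1 for ch in c[b * block_size:(b + 1) * block_size] if ch == i)
                      for b in range(nblocks)]
            self.prefix[i] = [0] + list(accumulate(counts))

    def below(self, block: int, upto: int, i: int) -> int:
        """Below(sigma_v(block), upto, i): child-i points among the block's first `upto`."""
        start = (block - 1) * self.s
        return sum(1 for ch in self.c[start:start + upto] if ch == i)

    def pred(self, t: int, i: int) -> int:
        block = -(-t // self.s)               # ceil(t / Delta^2)
        offset = 1 + (t - 1) % self.s         # position of t inside its block
        return self.prefix[i][block - 1] + self.below(block, offset, i)

    def succ(self, t: int, i: int) -> int:
        p = self.pred(t, i)
        return p if self.c[t - 1] == i else p + 1
```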

2.1 Query 

To answer a skyline counting query $R = [x_1, x_2] \times [y_1, y_2]$, we identify the nodes on the paths in $T$ from the two leaves storing $x_1$ and $x_2$ up to the lowest common ancestor of the two leaves. Let $v_1, \ldots, v_m$ be the set of these nodes in a right-to-left traversal in $T$ (see Figure 1). The horizontal span of the query, $[x_1, x_2]$, is the concatenation of the spans of at most one multislab $I_1, \ldots, I_m$ from each of $v_1, \ldots, v_m$. For each such multislab $I_\ell$ we form a new subquery $R_\ell = I_\ell \times [z_\ell, y_2]$, completely spanning the multislab in the horizontal direction and vertically having the range $[z_\ell, y_2]$, where $z_1 = y_1$ and $z_\ell = \max\{z_{\ell-1}, y^{\max}_{\ell-1} + 1\}$ for $\ell = 2, \ldots, m$, and $y^{\max}_{\ell-1}$ is the maximal $y$-coordinate of a point in $I_{\ell-1} \times [1, y_2]$. By definition of the $R_\ell$ queries, the skyline of the points contained within $R$ is exactly the union of the skylines for each of the $R_\ell$ subqueries (see Figure 1), since the points in $R_\ell$ cannot be dominated by other points that are both in $R$ and to the right of $I_\ell$.
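The decomposition into multislab subqueries can be checked by brute force with the sketch below. Assumptions, made for illustration only: the multislabs are given as $x$-intervals ordered right to left, $y^{\max}_{\ell-1}$ is taken over the points of $I_{\ell-1}$ with $y \le y_2$ (so $z_\ell = z_{\ell-1}$ when there are none), and `decompose` is a hypothetical name.

```python
from typing import List, Set, Tuple

Point = Tuple[int, int]

def skyline(pts: List[Point]) -> Set[Point]:
    """Brute-force maximal points (distinct points assumed)."""
    return {p for p in pts
            if not any(q != p and q[0] >= p[0] and q[1] >= p[1] for q in pts)}

def decompose(points: List[Point], multislabs: List[Tuple[int, int]],
              y1: int, y2: int) -> List[Tuple[Tuple[int, int], int]]:
    """Return [(I_l, z_l)] so that the subqueries are R_l = I_l x [z_l, y2].

    `multislabs` lists the x-intervals I_1, ..., I_m from right to left.
    """
    subqueries, z = [], y1
    for (a, b) in multislabs:
        subqueries.append(((a, b), z))
        ys = [y for (x, y) in points if a <= x <= b and y <= y2]
        if ys:
            z = max(z, max(ys) + 1)   # z_{l+1} = max{z_l, y^max_l + 1}
    return subqueries
```

Taking the union of the skylines of the returned subqueries reproduces the skyline of $R$, as claimed in the text.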

To navigate in $T$ we need to find the index of the successor of $y_1$ and the predecessor of $y_2$ in each of the $L_{v_\ell}$ lists. We start with $y_1$ and $y_2$ being the indexes at the root, and then use the $\mathrm{Succ}_v/\mathrm{Pred}_v$ structures at the nodes to find the successor of $y_1$ and predecessor of $y_2$ at all the nodes on the two paths from the root to $x_1$ and $x_2$. To find the topmost point below $y_2$ in a multislab we use $\mathrm{Topmost}_{v,i,j}$. To navigate $y^{\max}$ values up and down between the levels of $T$ we use $\pi_v(y^{\max})$ to move upwards and $\mathrm{Succ}_v(y^{\max}, j)$ to move downwards to a slab $j$. These navigations can be performed in $O(1)$ time per node on the paths, i.e. total time $O(\log_\Delta n) = O(\lg n/\lg\lg n)$.

What remains is to compute, in $O(1)$ time, the size of the skyline within a query range $R_\ell$. In the following we consider a query range that horizontally spans the child slabs $[i, j]$ of a node $v$, and vertically spans the indexes $[y_{\mathrm{bottom}}, y_{\mathrm{top}}]$ in $L_v$.

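If the query range is within a single block of $L_v$ (i.e. $\lceil y_{\mathrm{bottom}}/\Delta^2\rceil = \lceil y_{\mathrm{top}}/\Delta^2\rceil$), we compute the skyline size as
$$\mathrm{SkyCount}(\sigma_v(\lceil y_{\mathrm{top}}/\Delta^2\rceil), 1 + ((y_{\mathrm{bottom}} - 1) \bmod \Delta^2), 1 + ((y_{\mathrm{top}} - 1) \bmod \Delta^2), i, j).$$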



Otherwise we decompose the skyline counting query into five subranges (1)-(5), see Figure 4. We first compute the $y$-coordinate of the rightmost point $p_1$ in the top block $B_{\mathrm{top}}$ of the query range using
$$p_1.y = \mathrm{Rightmost}(\sigma_v(\lceil y_{\mathrm{top}}/\Delta^2\rceil), 1, 1 + ((y_{\mathrm{top}} - 1) \bmod \Delta^2), i, j) + \Delta^2(\lceil y_{\mathrm{top}}/\Delta^2\rceil - 1),$$
and compute the size of the skyline of the intersection of $B_{\mathrm{top}}$ and the query region by
$$\mathrm{SkyCount}(\sigma_v(\lceil y_{\mathrm{top}}/\Delta^2\rceil), 1, 1 + ((y_{\mathrm{top}} - 1) \bmod \Delta^2), i, j). \qquad (1)$$
Let $k_1$ be the slab containing $p_1$, computable as $k_1 = C_v(p_1.y)$. If no point is found in block $B_{\mathrm{top}}$, then $k_1 = i - 1$.

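Next we compute the $y$-coordinate of the topmost point $p_2$ in the multislab query range spanning slabs $[k_1 + 1, j]$ and all blocks between $B_{\mathrm{bottom}}$ and $B_{\mathrm{top}}$:
$$p_2.y = \mathrm{Topmost}_{v,k_1+1,j}(\lceil y_{\mathrm{bottom}}/\Delta^2\rceil + 1, \lceil y_{\mathrm{top}}/\Delta^2\rceil - 1).$$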



Figure 4: Skyline queries for multislabs.



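In the same subrange we find the $y$-coordinate of the rightmost point $p_3$ using
$$p_3.y = \mathrm{Rightmost}_{v,k_1+1,j}(\lceil y_{\mathrm{bottom}}/\Delta^2\rceil + 1, \lceil y_{\mathrm{top}}/\Delta^2\rceil - 1).$$
Finally, the number of points on the skyline between $p_2$ and $p_3$ (including $p_2$ and $p_3$) is computed by
$$\mathrm{SkyCount}_{v,k_1+1,j}(\lceil y_{\mathrm{bottom}}/\Delta^2\rceil + 1, \lceil y_{\mathrm{top}}/\Delta^2\rceil - 1). \qquad (2)$$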

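The slab containing the point $p_3$ is $k_3 = C_v(p_3.y)$. We compute the number of points on the skyline to the right of $p_3$ in block $B_{\mathrm{bottom}}$ by
$$\mathrm{SkyCount}(\sigma_v(\lceil y_{\mathrm{bottom}}/\Delta^2\rceil), 1 + ((y_{\mathrm{bottom}} - 1) \bmod \Delta^2), \Delta^2, k_3 + 1, j), \qquad (3)$$
and the $y$-coordinate of the topmost point $p_4$ in block $B_{\mathrm{bottom}}$ contained in slabs $[k_3 + 1, j]$ by
$$p_4.y = \mathrm{Topmost}(\sigma_v(\lceil y_{\mathrm{bottom}}/\Delta^2\rceil), 1 + ((y_{\mathrm{bottom}} - 1) \bmod \Delta^2), \Delta^2, k_3 + 1, j) + \Delta^2(\lceil y_{\mathrm{bottom}}/\Delta^2\rceil - 1).$$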

The remaining points to be counted are the skyline points in slab $k_1$ between $p_1$ and $p_2$, and in slab $k_3$ between $p_3$ and the point $p_4$ in block $B_{\mathrm{bottom}}$. These values can be computed by
$$\mathrm{SkyCount}_{c_v^{k_1}}(\mathrm{Succ}_v(p_2.y + 1, k_1), \mathrm{Pred}_v(p_1.y, k_1)) - 1 \qquad (4)$$
and
$$\mathrm{SkyCount}_{c_v^{k_3}}(\mathrm{Succ}_v(p_4.y, k_3), \mathrm{Pred}_v(p_3.y, k_3)) - 1, \qquad (5)$$
where we subtract one in both expressions to avoid double counting $p_1$ and $p_3$.

Figure 4 illustrates the five partial counts computed. In the above we assumed that all query ranges were non-empty. In case $p_1$ does not exist, then $k_1 = i - 1$ and (4) is not computed. If $p_4$ does not exist, then (5) stretches down to $y_{\mathrm{bottom}}$. If $p_2$ and $p_3$ do not exist ($p_2$ and $p_3$ are the same point if (2) only contains one maximal point), then (2) and (5) are not computed, the leftmost slab of (3) is $k_1 + 1$, and (4) stretches down to $p_4.y + 1$.

To summarize, the skyline size for each multislab query $R_\ell$ can be computed in $O(1)$ time, and the total time for a skyline counting query becomes $O(\lg n/\lg\lg n)$.
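The block index arithmetic used throughout the query, $\lceil y/\Delta^2\rceil$ for the block and $1 + ((y - 1) \bmod \Delta^2)$ for the position inside it, together with the inverse map $(\ell - 1)\Delta^2 + \text{offset}$, can be sketched as follows. Positions are 1-indexed as in $L_v$, `block_size` stands for $\Delta^2$, and the helper names are hypothetical.

```python
from typing import Tuple

def block_and_offset(y: int, block_size: int) -> Tuple[int, int]:
    """(ceil(y / Delta^2), 1 + ((y - 1) mod Delta^2)) for a 1-indexed position y in L_v."""
    return -(-y // block_size), 1 + (y - 1) % block_size

def global_index(block: int, offset: int, block_size: int) -> int:
    """Inverse map: (block - 1) * Delta^2 + offset."""
    return (block - 1) * block_size + offset

def single_block(y_bottom: int, y_top: int, block_size: int) -> bool:
    """True iff [y_bottom, y_top] lies inside one block of L_v."""
    return block_and_offset(y_bottom, block_size)[0] == block_and_offset(y_top, block_size)[0]
```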



3 Lower Bound 

That an orthogonal range skyline counting data structure requires space $\Omega(n\lg n)$ bits follows immediately, since each of the $n!$ different input point sets of size $n$ from $[n]^2$, where points have distinct $x$- and $y$-coordinates, can be reconstructed by querying, for each possible point in $[n]^2$, the rectangle consisting of that point alone, i.e. the space usage is at least $\lceil\lg_2(n!)\rceil = \Omega(n\lg n)$ bits.
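The counting argument can be made concrete with a short sketch. `lower_bound_bits` computes $\lceil\lg(n!)\rceil$ exactly with integer arithmetic, and `reconstruct` recovers the point set from any skyline counting oracle with one single-cell query per grid point, since a set of at most one point is its own skyline. Both names, and the oracle signature `count(x1, x2, y1, y2)`, are hypothetical.

```python
import math
from typing import Callable, List, Tuple

def lower_bound_bits(n: int) -> int:
    """ceil(lg(n!)): bits needed to tell apart the n! permutation point sets in [n]^2."""
    return (math.factorial(n) - 1).bit_length()   # ceil(log2 N) == (N - 1).bit_length()

def reconstruct(count: Callable[[int, int, int, int], int], n: int) -> List[Tuple[int, int]]:
    """Recover P with one query per grid point: [a, a] x [b, b] holds at most one point."""
    return [(a, b) for a in range(n) for b in range(n) if count(a, a, b, b) == 1]
```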

In the remainder of the section, we prove that any data structure using $S \ge n$ words of space must have query time $t = \Omega(\lg n/\lg(Sw/n))$, where $w = \Omega(\lg n)$ denotes the word size in bits. In particular, for $w = \lg^{O(1)} n$, this implies that any data structure using $n\lg^{O(1)} n$ space must have query time $t = \Omega(\lg n/\lg\lg n)$, showing that our data structure from Section 2 is optimal. Our lower bound holds even for data structures only supporting skyline counting queries inside 2-sided rectangles, i.e. query rectangles of the form $(-\infty, x] \times (-\infty, y]$. The lower bound is proved in the cell probe model of Yao [18] with word size $w = \Omega(\lg n)$. Since we derive our lower bound by reduction, we will not spend time on introducing the cell probe model, but merely note that lower bounds proved in this model apply to data structures developed in the unit cost RAM model. See e.g. [13] for a brief description of the cell probe model.
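As a worked instance of this specialization, assume (our notation, not the paper's) $S = n\lg^{c_1} n$ words and $w = \lg^{c_2} n$ bits for constants $c_1 \ge 0$ and $c_2 \ge 1$:

```latex
\lg\frac{Sw}{n} = \lg\bigl(\lg^{c_1} n \cdot \lg^{c_2} n\bigr) = (c_1 + c_2)\lg\lg n
\quad\Longrightarrow\quad
t = \Omega\!\left(\frac{\lg n}{\lg(Sw/n)}\right)
  = \Omega\!\left(\frac{\lg n}{(c_1 + c_2)\lg\lg n}\right)
  = \Omega\!\left(\frac{\lg n}{\lg\lg n}\right).
```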

Reachability in the Butterfly Graph. We prove our lower bound by reduction from the problem known as reachability oracles in the butterfly graph [12]. A butterfly graph of degree $B$ and depth $d$ is a directed graph with $d + 1$ layers, each having $B^d$ nodes ordered from left to right (see Figure 5). The nodes at level 0 are the sources and the nodes at level $d$ are the sinks. Each node, except the sinks, has out-degree $B$, and each node, except the sources, has in-degree $B$.

If we number the nodes at each level with $0, \ldots, B^d - 1$ from left to right and interpret each index $i \in [B^d]$ as a vector $v(i) = v(i)[d-1] \cdots v(i)[0] \in [B]^d$ (just write $i$ in base $B$), then the node at index $i$ at layer $k \in [d]$ has an outgoing edge to each node $j$ at layer $k + 1$ for which $v(j)$ and $v(i)$ differ only in the $k$th coordinate. Here the 0th coordinate is the coordinate corresponding to the least significant digit when thinking of $v(i)$ and $v(j)$ as numbers written in base $B$ (again see Figure 5). Observe that there is precisely one directed path between each source-sink pair. For the $s$th source and the $t$th sink, this path corresponds to "morphing" one digit of $v(s)$ into the corresponding digit in $v(t)$ for each layer traversed in the butterfly graph.
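The indexing convention and the unique source-sink path can be made concrete with a short sketch, assuming the digit convention above (index 0 is the least significant base-$B$ digit); the function names are hypothetical. At layer $k$ the path replaces digit $k$ of the current node with digit $k$ of the sink.

```python
from typing import List

def digits(i: int, B: int, d: int) -> List[int]:
    """v(i): the d base-B digits of i; index 0 is the least significant digit."""
    return [(i // B ** h) % B for h in range(d)]

def out_neighbors(i: int, k: int, B: int, d: int) -> List[int]:
    """Nodes j at layer k + 1 whose digits differ from v(i) only at coordinate k."""
    base = i - digits(i, B, d)[k] * B ** k
    return [base + c * B ** k for c in range(B)]

def unique_path(s: int, t: int, B: int, d: int) -> List[int]:
    """Node indices on the unique path from source s to sink t, layers 0..d."""
    path, vt = [s], digits(t, B, d)
    for k in range(d):
        cur = path[-1]
        path.append(cur - digits(cur, B, d)[k] * B ** k + vt[k] * B ** k)
    return path
```

For the example of Figure 5 ($B = 2$, $d = 3$, $s = 001$, $t = 110$) the path visits the nodes $001, 000, 010, 110$.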

The input to the problem of reachability oracles in the butterfly graph, with degree B and depth d, is a 
subset of the edges of the butterfly graph, i.e. we are given a subgraph G of the butterfly as input. A query is 
specified by a source-sink pair and the goal is to return whether there exists a directed path from the given 
source to the given sink in G. 

Pătraşcu proved the following lower bound for reachability oracles:

Theorem 1 (Pătraşcu [12], Section 5) Any cell probe data structure answering reachability queries in subgraphs of the butterfly graph with degree $B$ and depth $d$, having space usage $S$ words of $w$ bits, must have query time $t = \Omega(d)$, provided $B = \Omega(w^2)$ and $\lg B = \Omega(\lg(Sd/N))$. Here $N$ denotes the number of non-sink nodes in the butterfly graph.

We derive our lower bound by showing that any cell probe data structure for skyline range counting can 
be used to answer reachability queries in subgraphs of the butterfly graph for any degree B and depth d. 

Figure 5: A butterfly with degree $B = 2$ and depth $d = 3$. The path shown in bold is the unique path from the source $s = 001$ to the sink $t = 110$. A concrete input to the reachability oracles in the butterfly graph problem consists of a subset of the edges of the butterfly. An example input is obtained by deleting the dashed edges labelled $a$, $b$ and $c$. For that input, there is no path from the source $s$ to the sink $t$ since the edge $b$ is not part of the input.

Edges to 2-d Rectangles. Consider the butterfly graph with degree $B$ and depth $d$. The first step of our reduction is inspired by the reduction Pătraşcu used to obtain a lower bound for 2-d rectangle stabbing: consider an edge of the butterfly graph, leaving the $i$th node at layer $k \in [d]$ and entering the $j$th node in layer $k + 1$. We denote this edge $e_k(i, j)$. The source-sink pairs $(s, t)$ that are connected through $e_k(i, j)$ are those for which:



1. The source has an index $s$ satisfying $v(s)[h] = v(i)[h]$ for $h \ge k$, i.e. $s$ and $i$ agree on the $d - k$ most significant digits when written in base $B$.

2. The sink has an index $t$ satisfying $v(t)[h] = v(j)[h]$ for $h < k + 1$, i.e. $t$ and $j$ agree on the $k + 1$ least significant digits when written in base $B$.

We now map each edge $e_k(i, j)$ of the butterfly graph to a rectangle in 2-d. For the edge $e_k(i, j)$, we create the rectangle $r_k(i, j) = [x_1, x_2] \times [y_1, y_2]$ where:

• $x_1 = v(i)[d-1]\,v(i)[d-2] \cdots v(i)[k]\,0 \cdots 0$ when written in base $B$,

• $x_2 = v(i)[d-1]\,v(i)[d-2] \cdots v(i)[k]\,(B-1) \cdots (B-1)$ when written in base $B$,

• $y_1 = v(j)[0]\,v(j)[1] \cdots v(j)[k]\,0 \cdots 0$ when written in base $B$, and

• $y_2 = v(j)[0]\,v(j)[1] \cdots v(j)[k]\,(B-1) \cdots (B-1)$ when written in base $B$.

The crucial observation is that for a source-sink pair, where the source is the $s$th source and the sink is the $t$th sink, the edges on the path from the source to the sink in the butterfly graph are precisely those edges $e_k(i, j)$ for which the corresponding rectangle $r_k(i, j)$ contains the point $(s, \mathrm{rev}_B(t))$, where $\mathrm{rev}_B(t)$ is the number obtained by writing $t$ in base $B$ (with $d$ digits) and then reversing the digits.

We now collect the set of rectangles $R$, containing each rectangle $r_k(i, j)$ corresponding to an edge of the butterfly graph. Given an input subgraph $G$, we mark all rectangles $r_k(i, j) \in R$ for which the corresponding edge $e_k(i, j)$ is also in $G$. It follows that there is a directed path from the $s$th source to the $t$th sink in the subgraph $G$ if and only if $(s, \mathrm{rev}_B(t))$ is not contained in any unmarked rectangle in $R$.
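The rectangle construction, the crucial observation and the marking rule can be checked exhaustively on small butterflies with the sketch below. Edges are encoded as hypothetical triples $(k, i, j)$, and a subgraph $G$ is a Python set of such triples. `rectangle` fixes the $d - k$ most significant digits $v(i)[d-1], \ldots, v(i)[k]$ in $x$ and the $k + 1$ least significant digits $v(j)[0], \ldots, v(j)[k]$ of the sink side in $y$ (read most significant first), as required by conditions 1 and 2.

```python
from typing import List, Set, Tuple

Edge = Tuple[int, int, int]  # (k, i, j): node i at layer k -> node j at layer k + 1

def digits(i: int, B: int, d: int) -> List[int]:
    """v(i): the d base-B digits of i, index 0 = least significant."""
    return [(i // B ** h) % B for h in range(d)]

def number(msb_first: List[int], B: int) -> int:
    """Value of a digit sequence given most significant digit first."""
    value = 0
    for c in msb_first:
        value = value * B + c
    return value

def all_edges(B: int, d: int) -> List[Edge]:
    """Every butterfly edge: j differs from i only in digit k."""
    edges = []
    for k in range(d):
        for i in range(B ** d):
            base = i - digits(i, B, d)[k] * B ** k
            edges.extend((k, i, base + c * B ** k) for c in range(B))
    return edges

def rectangle(e: Edge, B: int, d: int) -> Tuple[int, int, int, int]:
    """r_k(i, j) = [x1, x2] x [y1, y2]."""
    k, i, j = e
    vi, vj = digits(i, B, d), digits(j, B, d)
    xp = [vi[h] for h in range(d - 1, k - 1, -1)]   # v(i)[d-1] ... v(i)[k]
    yp = [vj[h] for h in range(k + 1)]              # v(j)[0] ... v(j)[k]
    return (number(xp + [0] * k, B), number(xp + [B - 1] * k, B),
            number(yp + [0] * (d - 1 - k), B), number(yp + [B - 1] * (d - 1 - k), B))

def rev(t: int, B: int, d: int) -> int:
    """rev_B(t): the d base-B digits of t in reverse order."""
    return number(digits(t, B, d), B)

def reachable_via_rectangles(G: Set[Edge], s: int, t: int, B: int, d: int) -> bool:
    """Path from source s to sink t in G iff (s, rev_B(t)) avoids every unmarked rectangle."""
    y = rev(t, B, d)
    for e in all_edges(B, d):
        if e not in G:                               # unmarked rectangle
            x1, x2, y1, y2 = rectangle(e, B, d)
            if x1 <= s <= x2 and y1 <= y <= y2:
                return False
    return True
```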

Our goal is now to transform marked and unmarked rectangles to points, such that we can use a skyline counting data structure to determine whether a given point $(s, \mathrm{rev}_B(t))$ is contained in an unmarked rectangle. Note that our reduction only works for the rectangle set $R$ obtained from the butterfly graph, and not for an arbitrary set of rectangles, i.e. we could not have reduced from the general problem of 2-d rectangle stabbing.

2-d Rectangles to Points. To avoid tedious details, from this point on we allow the input to skyline queries to have multiple points with the same $x$- or $y$-coordinate (though not two points with both coordinates identical). This assumption can easily be removed, but it would only distract the reader from the main ideas of our reduction. We still use the definition that a point $(x, y)$ dominates a point $(x', y')$ if and only if $x' \le x$ and $y' \le y$.

The next step of the reduction is to map the rectangles $R$ to a set of points. For this, we first transform the coordinates slightly: for every rectangle $r_k(i, j) \in R$, having coordinates $[x_1, x_2] \times [y_1, y_2]$, we modify each of the coordinates in the following way:

• $x_1 \leftarrow dx_1 + (d - 1 - k)$,

• $x_2 \leftarrow dx_2 + d - 1$,

• $y_1 \leftarrow dy_1 + k$, and

• $y_2 \leftarrow dy_2 + d - 1$.

The multiplication with $d$ essentially corresponds to expanding each point with integer coordinates to a $d \times d$ grid of points. The purpose of adding $k$ to $y_1$ and $(d - 1 - k)$ to $x_1$ is to ensure that, if two rectangles share a lower-left corner (only possible for two rectangles $r_k(i, j)$ and $r_{k'}(i', j')$ where $k \ne k'$), then those corners do not dominate each other in the transformed set of rectangles. We will see later that the particular placement of the points based on $k$ also plays a key role. We use $\pi : [B^d]^2 \to [dB^d]^2$ to denote the above map. With this notation, the transformed set of rectangles is denoted $\pi(R)$ and each rectangle $r_k(i, j) \in R$ is mapped to $\pi(r_k(i, j)) \in \pi(R)$.

We now create the set of points $P'$ containing the lower-left corner points of all rectangles $\pi(r_k(i, j)) \in \pi(R)$, i.e. for each $\pi(r_k(i, j)) = [x_1, x_2] \times [y_1, y_2]$, we add the point $(x_1, y_1)$ to $P'$. See Figure 6 for an example. The set $P'$ has the following crucial property:

Lemma 4 Let $(x, y)$ be a point with coordinates in $[B^d] \times [B^d]$. Then for the two-sided query rectangle $Q = (-\infty, dx + d - 1] \times (-\infty, dy + d - 1]$, it holds that $\mathrm{Skyline}(Q \cap P')$ contains precisely the points in $P'$ corresponding to the lower-left corners of the rectangles $\pi(r_k(i, j)) \in \pi(R)$ for which $r_k(i, j)$ contains $(x, y)$.
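Before the proof, the statement can be checked by brute force on small butterflies. The sketch below builds $P'$ from the transformed lower-left corners and compares $\mathrm{Skyline}(Q \cap P')$ with the corners of the rectangles containing $(x, y)$ for every grid point. It uses the rectangle construction with the sink side fixing $v(j)[0], \ldots, v(j)[k]$, matching condition 2; the helper names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Point = Tuple[int, int]

def digits(i: int, B: int, d: int) -> List[int]:
    """v(i): the d base-B digits of i; index 0 is the least significant digit."""
    return [(i // B ** h) % B for h in range(d)]

def number(msb_first: List[int], B: int) -> int:
    """Value of a base-B digit sequence given most significant digit first."""
    value = 0
    for c in msb_first:
        value = value * B + c
    return value

def corner_points(B: int, d: int) -> Dict[Point, Tuple[int, int, int, int]]:
    """Map each transformed corner (d*x1 + d-1-k, d*y1 + k) of pi(r_k(i, j))
    to the untransformed rectangle (x1, x2, y1, y2)."""
    corners = {}
    for k in range(d):
        for i in range(B ** d):
            vi = digits(i, B, d)
            for c in range(B):
                vj = vi[:k] + [c] + vi[k + 1:]        # j differs from i only in digit k
                xp = [vi[h] for h in range(d - 1, k - 1, -1)]
                yp = [vj[h] for h in range(k + 1)]
                rect = (number(xp + [0] * k, B), number(xp + [B - 1] * k, B),
                        number(yp + [0] * (d - 1 - k), B),
                        number(yp + [B - 1] * (d - 1 - k), B))
                corners[(d * rect[0] + d - 1 - k, d * rect[2] + k)] = rect
    return corners

def skyline(pts: List[Point]) -> Set[Point]:
    """Maximal points; (x, y) dominates (x', y') iff x' <= x and y' <= y."""
    return {p for p in pts
            if not any(q != p and q[0] >= p[0] and q[1] >= p[1] for q in pts)}

def lemma4_holds(B: int, d: int) -> bool:
    corners = corner_points(B, d)
    if len(corners) != d * B ** (d + 1):               # corners must be distinct
        return False
    for x in range(B ** d):
        for y in range(B ** d):
            qx, qy = d * x + d - 1, d * y + d - 1      # Q = (-inf, qx] x (-inf, qy]
            in_q = [p for p in corners if p[0] <= qx and p[1] <= qy]
            expected = {p for p, (x1, x2, y1, y2) in corners.items()
                        if x1 <= x <= x2 and y1 <= y <= y2}
            if skyline(in_q) != expected:
                return False
    return True
```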

Proof. First let $p = (x_1, y_1) \in P'$ be the lower-left corner of a rectangle $\pi(r_k(i, j))$ such that $r_k(i, j)$ contains the point $(x, y)$. We want to show that $p \in \mathrm{Skyline}(Q \cap P')$. Since $r_k(i, j)$ contains the point $(x, y)$, we have $x \ge \lfloor x_1/d\rfloor$ and $y \ge \lfloor y_1/d\rfloor$. From this, we get $dx + d - 1 \ge d\lfloor x_1/d\rfloor + (d - 1 - k) = x_1$ and $dy + d - 1 \ge d\lfloor y_1/d\rfloor + k = y_1$, i.e. $p$ is inside $Q$. Since $(x, y)$ is inside $r_k(i, j)$, we also have that $(dx + d - 1, dy + d - 1)$ is dominated by the upper-right corner of $\pi(r_k(i, j))$, i.e. $(dx + d - 1, dy + d - 1)$ is inside $\pi(r_k(i, j))$.

Figure 6: The butterfly with degree $B = 2$ and depth $d = 3$ from Figure 5 translated to a set of rectangles. The dashed rectangles correspond to the edges $a$, $b$ and $c$ from Figure 5. Every grid point is replaced by up to $d$ points placed on a diagonal. Each rectangle obtained from an edge of the butterfly graph produces one point on the diagonal corresponding to the rectangle's lower-left corner. The points corresponding to the rectangles obtained from edges $a$, $b$ and $c$ are shown in gray. The query corresponding to the source $s = 001$ and the sink $t = 110$ in Figure 5 is translated to the two-sided skyline query rectangle with its upper-right corner at the ×. The double circled points are the points on the skyline of the query range, and these correspond exactly to the lower-left corners of the rectangles containing the ×.

What remains to be shown is that no other point in $Q \cap P'$ dominates $p$. For this, assume for contradiction that some point $p' = (x'_1, y'_1) \in P'$ is both in $Q$ and also dominates $p$. First, since $p'$ is dominated by $(dx + d - 1, dy + d - 1)$ and also dominates $p$, we know that $p'$ must be inside $\pi(r_k(i, j))$. Now let $\pi(r_{k'}(i', j')) \ne \pi(r_k(i, j))$ be the rectangle in $\pi(R)$ from which $p'$ was generated, i.e. $p'$ is the lower-left corner of $\pi(r_{k'}(i', j'))$. We have three cases:

1. First, if $k' = k$, we immediately get a contradiction, since the rectangles $\pi(R)_k = \{\pi(r_{k'}(i',j')) \in
\pi(R) \mid k' = k\}$ are pairwise disjoint and hence $p'$ could not have been inside $\pi(r_k(i,j))$.



2. If $k' < k$, we know that $\pi(r_{k'}(i',j'))$ is shorter in the x-direction and longer in the y-direction than $\pi(r_k(i,j))$.
From our transformation, we know that $(y_1 \bmod d) = k$ and $(y'_1 \bmod d) = k' < k$. Thus, since $p'$
dominates $p$ (so $y'_1 \ge y_1$), we must have $\lfloor y'_1/d \rfloor > \lfloor y_1/d \rfloor$. But these two values are precisely the y-coordinates of
the lower-left corners of $r_{k'}(i',j')$ and $r_k(i,j)$, respectively. By definition, we get:

$$v(j')[0]\,v(j')[1]\cdots v(j')[k'+1]\,0\cdots 0 \;>\; v(j)[0]\,v(j)[1]\cdots v(j)[k+1]\,0\cdots 0 .$$

Since $k' < k$, this furthermore gives us

$$v(j')[0]\,v(j')[1]\cdots v(j')[k'+1] \;>\; v(j)[0]\,v(j)[1]\cdots v(j)[k'+1] .$$

From this it follows that

$$v(j')[0]\,v(j')[1]\cdots v(j')[k'+1]\,0\cdots 0 \;>\; v(j)[0]\,v(j)[1]\cdots v(j)[k+1]\,(B-1)\cdots(B-1),$$

i.e. the lower-left corner of $r_{k'}(i',j')$ is outside $r_k(i,j)$, which also implies that the lower-left corner
of $\pi(r_{k'}(i',j'))$ is outside $\pi(r_k(i,j))$. That is, $p'$ is outside $\pi(r_k(i,j))$, which gives the contradiction.

3. The case $k' > k$ is symmetric to the case $k' < k$, just using the x-coordinates instead of the
y-coordinates to derive the contradiction.

Figure 7: Points corresponding to unmarked rectangles are replaced by two points. The example from
Figure 5 and Figure 6 has three unmarked rectangles, corresponding to edges a, b and c of the butterfly
graph. As shown, each of these rectangles becomes two (gray) input points, while marked rectangles are represented
by only one input point. The upper-right corner of the two-sided query rectangle corresponding to the
source s = 001 and sink t = 110 in the previous examples is shown as a ×. The double-circled points are
the points on the skyline of the query range. As can be seen, the unmarked rectangle corresponding to the
edge labelled b contributes two points to the skyline of the query at ×.

The last step of the proof is to show that no point $p = (x_1, y_1) \in P'$ can be in $\mathrm{Skyline}(Q \cap P')$ and at the
same time correspond to the lower-left corner of a rectangle $\pi(r_k(i,j))$ where $r_k(i,j)$ does not contain the
point $(x, y)$. First observe that $(dx + d - 1, dy + d - 1)$ is contained in precisely one rectangle $\pi(r_{k'}(i',j'))$ for
each value of $k' \in [d]$. Now let $\pi(r_k(i',j')) \neq \pi(r_k(i,j))$ be the rectangle containing $(dx + d - 1, dy + d - 1)$
amongst the rectangles $\pi(R)_k$. The lower-left corner of this rectangle is dominated by $(dx + d - 1, dy + d - 1)$
but also dominates $p$; hence $p$ is not in $\mathrm{Skyline}(Q \cap P')$. □
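The coordinate arithmetic in the proof of Lemma 4 can be checked numerically. This minimal sketch assumes, as the proof does, that a rectangle $r_k(i,j)$ with lower-left grid corner $(X, Y)$ contributes the point $(dX + d - 1 - k,\ dY + k)$ to $P'$. The helper names `corner_point`, `query_corner` and `dominates` are hypothetical. The sketch only illustrates the diagonal placement and the inequalities $x_1 \le dx + d - 1$ and $y_1 \le dy + d - 1$. It does not build the butterfly rectangles themselves.

```python
def corner_point(X: int, Y: int, k: int, d: int) -> tuple:
    """Hypothetical helper: lower-left corner of pi(r_k(i, j)).

    (X, Y) is assumed to be the lower-left grid corner of r_k(i, j) and k in
    [d] its level; the corner is placed at (d*X + d - 1 - k, d*Y + k), so
    that k = y1 mod d, as used in the proof of Lemma 4.
    """
    if not 0 <= k < d:
        raise ValueError("level k must lie in [0, d)")
    return (d * X + (d - 1 - k), d * Y + k)


def query_corner(x: int, y: int, d: int) -> tuple:
    """Upper-right corner (dx + d - 1, dy + d - 1) of the two-sided query Q."""
    return (d * x + d - 1, d * y + d - 1)


def dominates(p: tuple, q: tuple) -> bool:
    """True if p != q and p is weakly to the right of and above q."""
    return p != q and p[0] >= q[0] and p[1] >= q[1]


# The d possible corners generated from the grid point (2, 5) with d = 3.
d = 3
cell = [corner_point(2, 5, k, d) for k in range(d)]
print(cell)  # [(8, 15), (7, 16), (6, 17)]

# They lie on a diagonal: no corner dominates another one.
assert not any(dominates(p, q) for p in cell for q in cell)

# For query points (x, y) >= (2, 5), all of them lie inside Q.
for x, y in [(2, 5), (3, 5), (2, 9)]:
    qx, qy = query_corner(x, y, d)
    assert all(px <= qx and py <= qy for px, py in cell)
```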
Handling Marked and Unmarked Rectangles. The above steps are all independent of the concrete input
subgraph $G$. As discussed, we need a way to determine whether a query point is contained in an unmarked
rectangle or not. In light of Lemma 4, this step is now very simple. First, multiply all coordinates of points
in $P'$ by 2. This corresponds to expanding each point with integer coordinates into a $2 \times 2$ grid. Now for
every point $p \in P'$, if the rectangle $\pi(r_k(i,j))$ from which $p$ was generated is marked, then we add 1 to
both the x- and the y-coordinate of $p$, i.e. we move $p$ to the upper-right corner of the $2 \times 2$ grid in which it is
placed. If $\pi(r_k(i,j))$ is unmarked, we replace $p$ by two points, one where we add 1 to the x-coordinate and
one where we add 1 to the y-coordinate; see Figure 7. We denote the resulting set of points by $P(G)$. It follows
immediately that:

Corollary 1 Let $G$ be a subgraph of the butterfly graph with degree $B$ and depth $d$. Also, let $(x, y)$ be a
point with coordinates in $[B^d] \times [B^d]$. Then for the two-sided query rectangle $Q = (-\infty, 2d(x+1) - 1] \times
(-\infty, 2d(y+1) - 1]$, it holds that $\mathrm{Skyline}(Q \cap P(G))$ contains precisely one point from $P(G)$ for every
marked rectangle in $R$ that contains $(x, y)$, two points from $P(G)$ for every unmarked rectangle in $R$ that
contains $(x, y)$, and no other points.

Corollary 2 Let $G$ be a subgraph of the butterfly graph with degree $B$ and depth $d$. Also, let $(x, y)$ be a
point with coordinates in $[B^d] \times [B^d]$. Then for the two-sided query rectangle $Q = (-\infty, 2d(x+1) - 1] \times
(-\infty, 2d(y+1) - 1]$, it holds that $|\mathrm{Skyline}(Q \cap P(G))| - d$ equals the number of unmarked rectangles in
$R$ that contain $(x, y)$.

Corollary 3 Let $G$ be a subgraph of the butterfly graph with degree $B$ and depth $d$. Let $s$ be the index of a
source and $t$ the index of a sink. Then the $s$'th source can reach the $t$'th sink in $G$ if and only if $|\mathrm{Skyline}(Q \cap
P(G))| = d$ for the two-sided query rectangle $Q = (-\infty, 2d(s+1) - 1] \times (-\infty, 2d(\mathrm{rev}_B(t) + 1) - 1]$.
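The counting behind Corollaries 1 to 3 can be illustrated on a toy configuration. This sketch is not a butterfly instance. It assumes that the $d$ rectangles containing the query point all share the lower-left grid point $(X, Y)$, so that their corners lie on one diagonal, placed as in the proof of Lemma 4. It then applies the doubling step for marked and unmarked rectangles. All function names are hypothetical, and the skyline is computed with a standard sweep in decreasing x-order.

```python
def skyline(points):
    """Maximal points: sweep by decreasing x (ties: decreasing y) and keep a
    point iff its y-coordinate exceeds every y-coordinate seen so far."""
    result, best_y = [], None
    for x, y in sorted(set(points), key=lambda p: (-p[0], -p[1])):
        if best_y is None or y > best_y:
            result.append((x, y))
            best_y = y
    return result


def double_point(p, marked):
    """Doubling step: a marked corner becomes (2x+1, 2y+1); an unmarked one is
    replaced by the two incomparable points (2x+1, 2y) and (2x, 2y+1)."""
    x, y = p
    if marked:
        return [(2 * x + 1, 2 * y + 1)]
    return [(2 * x + 1, 2 * y), (2 * x, 2 * y + 1)]


def toy_skyline_count(marks, X=0, Y=0):
    """Skyline size for the toy configuration: level k (0 <= k < d) has its
    corner at (d*X + d-1-k, d*Y + k) and is marked iff marks[k] is True."""
    d = len(marks)
    pts = []
    for k, marked in enumerate(marks):
        pts += double_point((d * X + d - 1 - k, d * Y + k), marked)
    qx, qy = 2 * d * (X + 1) - 1, 2 * d * (Y + 1) - 1  # upper-right corner of Q
    inside = [(x, y) for x, y in pts if x <= qx and y <= qy]
    return len(skyline(inside))


marks = [True, False, True]              # one unmarked rectangle, d = 3
count = toy_skyline_count(marks)
print(count)                             # 4, i.e. d + 1
print(count == len(marks))               # False: not reachable (Corollary 3)
print(toy_skyline_count([True, True, True]) == 3)  # True: reachable
```

In this toy setting the skyline size is always $d$ plus the number of unmarked levels, which matches the identity stated in Corollary 2.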

Deriving the Lower Bound. The lower bound can be derived from Corollary 3 and Theorem 1 as follows.
First note that the set $R$ contains $NB$ rectangles, since each rectangle corresponds to an edge of the
butterfly graph and each of the $N$ non-sink nodes of the butterfly graph has $B$ outgoing edges. Each of these
rectangles gives one or two points in $P(G)$. Letting $n$ denote $|P(G)|$, we have $NB \le n \le 2NB$. From
$N = d \cdot B^d \le n$ we get $d \le \lg n$ and $d = \Theta(\lg_B N)$.

Given $n$, $w \ge \lg n$, and $S \ge n$, we now derive a lower bound on the query time. Setting $B = (Sw/n)^2$,
we have $B = \Omega(w^2)$ and $\lg B = \Omega(\lg(S/N))$ (as required by Theorem 1), where the last bound follows from
$\lg(S/N) \le \lg\big(S/(n/(2B))\big) = \lg(2SB/n) \le \lg(2B^2) = O(\lg B)$, using $N \ge n/(2B)$ and $S/n \le \sqrt{B}$. Furthermore, we have
$\lg(Sw/n) = \tfrac{1}{2}\lg\big((Sw/n)^2\big) = \tfrac{1}{2}\lg B$. From Theorem 1 we can now bound the time for a skyline counting query by
$t = \Omega(d) = \Omega(\lg_B N) = \Omega(\lg n/\lg B) = \Omega(\lg n/\lg(Sw/n))$.
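As a worked instance of this bound, take the setting of the abstract: word size $w = \Theta(\lg n)$ and space $S = n\lg^{c} n$ words for a constant $c \ge 0$. The parametrization by $c$ is only an illustration. Substituting into $t = \Omega(\lg n/\lg(Sw/n))$ gives:

```latex
\begin{aligned}
\frac{Sw}{n} &= \frac{n\lg^{c} n \cdot \Theta(\lg n)}{n} = \Theta\big(\lg^{c+1} n\big),\\
\lg\frac{Sw}{n} &= (c+1)\lg\lg n + O(1) = O(\lg\lg n),\\
t &= \Omega\Big(\frac{\lg n}{\lg(Sw/n)}\Big) = \Omega\Big(\frac{\lg n}{\lg\lg n}\Big).
\end{aligned}
```

With $c = 0$ (linear space), this matches the $O(\lg n/\lg\lg n)$ query time of the upper bound.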

4 Skyline Range Reporting 

In this section, we show how to extend our skyline range counting data structure from Section 2 to also
support reporting. Given a query rectangle $R = [x_1, x_2] \times [y_1, y_2]$, we let $v_1, \ldots, v_m$ and $I_1, \ldots, I_m$ be
defined as in Section 2.1. The goal is to report the skyline for each of the subqueries $R_\ell = I_\ell \times [z_\ell, y_2]$,
where $z_1 = y_1$ and $z_\ell = \max\{z_{\ell-1}, y^{\max}_{\ell-1} + 1\}$ for $\ell = 2, \ldots, m$, and $y^{\max}_\ell$ is the maximal y-coordinate of a
point in $I_\ell \times [1, y_2]$. Using the approach from Section 2.1, we assume that the $z_\ell$'s have been computed, as well
as the index of the successor of $z_\ell$ and the index of the predecessor of $y_2$ in each of the $L_{v_\ell}$ lists. Recall that the
lists $L_v$ are not stored explicitly.
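The following brute-force sketch illustrates why the subqueries $R_\ell$ together yield $\mathrm{Skyline}(R \cap P)$. It makes four assumptions. First, $I_1$ is the rightmost slab and $I_m$ the leftmost, as the recurrence for $z_\ell$ suggests. Second, coordinates are positive integers. Third, $y^{\max}_\ell$ is taken as 0 for an empty slab. Fourth, the slabs exactly partition $[x_1, x_2]$. The function names are hypothetical, and the code ignores the data structure entirely.

```python
def skyline(points):
    """Maximal points via a sweep in decreasing x (ties: decreasing y)."""
    result, best_y = [], None
    for x, y in sorted(set(points), key=lambda p: (-p[0], -p[1])):
        if best_y is None or y > best_y:
            result.append((x, y))
            best_y = y
    return result


def subquery_bottoms(slabs, y1, y2):
    """z_1 = y1 and z_l = max(z_{l-1}, ymax_{l-1} + 1), where ymax_l is the
    largest y <= y2 among points of slab l (0 if there is none).
    slabs[0] holds the points of the rightmost slab I_1."""
    z = [y1]
    for pts in slabs[:-1]:
        ymax = max((y for _, y in pts if 1 <= y <= y2), default=0)
        z.append(max(z[-1], ymax + 1))
    return z


def skyline_by_slabs(slabs, y1, y2):
    """Union of the skylines of the subqueries R_l = I_l x [z_l, y2]."""
    out = set()
    for pts, z in zip(slabs, subquery_bottoms(slabs, y1, y2)):
        out |= set(skyline([p for p in pts if z <= p[1] <= y2]))
    return out


pts = [(17, 10), (12, 10), (11, 12), (4, 14), (3, 6)]
slabs = [[p for p in pts if lo <= p[0] <= hi]
         for lo, hi in [(15, 18), (10, 14), (3, 9)]]   # I_1, I_2, I_3
print(subquery_bottoms(slabs, 4, 16))          # [4, 11, 13]
print(sorted(skyline_by_slabs(slabs, 4, 16)))  # [(4, 14), (11, 12), (17, 10)]
```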
To answer the query $R_\ell$ at a node $v = v_\ell$, let $[i, j]$ be the range of children of $v$ that are spanned by $R_\ell$
in the horizontal direction, let $y_{\text{bottom}}$ be the index of the successor of $z_\ell$ in $L_v$, and let $y_{\text{top}}$ be the index of
the predecessor of $y_2$ in $L_v$. We first produce an output list $Y_\ell$ storing each point of $\mathrm{Skyline}(R_\ell \cap P_v)$ as
an index into $L_v$. The key observation for producing this list is that the skyline inside a query rectangle is
the set of points produced by the following procedure: first report the rightmost point in the query range,
and then recurse on the query rectangle obtained by moving the bottom side of the query to just above the
returned point.
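The reporting procedure described above can be sketched by brute force over an explicit point list. The function name is hypothetical, and the sketch breaks ties between points with equal x-coordinates in favor of the higher one. That tie-breaking is an assumption the procedure needs when x-coordinates repeat. The actual structure replaces the linear scans by Rightmost queries.

```python
def skyline_by_rightmost(points, x1, x2, y1, y2):
    """Report the skyline of the points of P inside R = [x1, x2] x [y1, y2]:
    repeatedly report the rightmost point (ties: the topmost) and move the
    bottom side of the query to just above it."""
    inside = [p for p in set(points) if x1 <= p[0] <= x2]
    out, bottom = [], y1
    while True:
        cand = [p for p in inside if bottom <= p[1] <= y2]
        if not cand:
            return out
        best = max(cand)          # largest x, then largest y
        out.append(best)
        bottom = best[1] + 1      # new bottom side just above the reported point


print(skyline_by_rightmost([(1, 5), (3, 3), (2, 4), (3, 1), (4, 0)], 0, 10, 0, 10))
# [(4, 0), (3, 3), (2, 4), (1, 5)]
```

Every point still undominated after a step has a larger y-coordinate than the reported point, so raising the bottom side loses no skyline point.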

We implement this strategy as follows. First, if $R_\ell$ is completely within one block of $L_v$ (block $\lceil y_{\text{top}}/\Delta^2 \rceil$),
we answer it by first running

$$\mathrm{Rightmost}\big(\sigma_v(\lceil y_{\text{top}}/\Delta^2 \rceil),\ 1 + ((y_{\text{bottom}} - 1) \bmod \Delta^2),\ 1 + ((y_{\text{top}} - 1) \bmod \Delta^2),\ i,\ j\big).$$

Adding $\Delta^2(\lceil y_{\text{top}}/\Delta^2 \rceil - 1)$ to the returned value gives the index $k$ into $L_v$ of the rightmost point in the output.
We add $k$ to $Y_\ell$ and recurse on the query rectangle with y-range from $k + 1$ to $y_{\text{top}}$.
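The index arithmetic above converts between 1-based positions in $L_v$ and (block, offset) pairs. Here is a minimal sketch, with `s` standing for the block size $\Delta^2$. The helper names are hypothetical.

```python
def block_of(t: int, s: int) -> int:
    """1-based block containing the 1-based index t: ceil(t / s)."""
    return (t + s - 1) // s


def offset_in_block(t: int, s: int) -> int:
    """1-based position of t inside its block: 1 + ((t - 1) mod s)."""
    return 1 + (t - 1) % s


def global_index(block: int, offset: int, s: int) -> int:
    """Inverse map s * (block - 1) + offset, turning a position returned
    inside a block back into an index into L_v."""
    return s * (block - 1) + offset


def within_one_block(y_bottom: int, y_top: int, s: int) -> bool:
    """The subquery fits in a single block iff both end indices share a block."""
    return block_of(y_bottom, s) == block_of(y_top, s)


s = 9  # for example, Delta = 3
print(block_of(20, s), offset_in_block(20, s))                  # 3 2
print(global_index(3, 2, s))                                    # 20
print(within_one_block(19, 27, s), within_one_block(18, 19, s))  # True False
```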

If the query range is not contained in one block, we use the decomposition into five queries that was
introduced in Section 2.1; see Figure 4. We define $p_1, \ldots, p_4$ and (1), …, (5) as in Section 2.1. The
subquery (1) is answered as just described for the case of a query range completely within a block. The
query (4) is answered by first using the query $\mathrm{Rightmost}_{k_1}(\mathrm{Succ}_v(p_2.y + 1, k_1), \mathrm{Pred}_v(p_1.y, k_1))$. Following
that, we move the bottom of the query rectangle to just above the returned point and recurse.

The query range (2) is answered by first repeatedly using the $\mathrm{Rightmost}_{[k_1+1, j]}$ operation to identify
the blocks within the query range (2) containing points from $\mathrm{Skyline}(R_\ell \cap P_v)$. Let $q_1, \ldots, q_t$ be the indexes
into $L_v$ of the rightmost points returned in each of these blocks, computable by

$$q_1 = \mathrm{Rightmost}_{[k_1+1, j]}\big(\lceil y_{\text{bottom}}/\Delta^2 \rceil + 1,\ \lceil y_{\text{top}}/\Delta^2 \rceil - 1\big),$$

and for $r > 1$ (until no further point is found) by

$$q_r = \mathrm{Rightmost}_{[k_1+1, j]}\big(\lceil q_{r-1}/\Delta^2 \rceil + 1,\ \lceil y_{\text{top}}/\Delta^2 \rceil - 1\big).$$

Within each block $\lceil q_r/\Delta^2 \rceil$ we compute the additional points that should be reported within slabs $[k_1, j]$
from right to left, starting with $\mathrm{Rightmost}\big(\sigma_v(\lceil q_r/\Delta^2 \rceil),\ 2 + ((q_r - 1) \bmod \Delta^2),\ \Delta^2,\ k_1,\ j\big)$, until no point
is found or we find the first point $f$ that should not be reported, i.e. $f$ is dominated by $q_{r+1}$. This can be
checked by the condition $\gamma < C_v(q_{r+1})$, or $\gamma = C_v(q_{r+1})$ and $q' = \mathrm{Rightmost}_{v'}(f', q')$, where $\gamma = C_v(f)$,
$v' = c_\gamma$, $f' = \mathrm{Pred}_v(f, \gamma)$, and $q' = \mathrm{Pred}_v(q_{r+1}, \gamma)$.

The query (5) is answered by repeatedly using Rightmost queries, and finally we answer (3) using

$$\mathrm{Rightmost}\big(\sigma_v(\lceil y_{\text{bottom}}/\Delta^2 \rceil),\ 1 + ((y_{\text{bottom}} - 1) \bmod \Delta^2),\ \Delta^2,\ k_3 + 1,\ j\big)$$

and recursing above the returned point. It follows that the list $Y_\ell$ is produced in $O(1 + |\mathrm{Skyline}(R_\ell \cap P_v)|) =
O(1 + |\mathrm{Skyline}(R_\ell \cap P)|)$ time. Summing over all lists $Y_\ell$, we get a total time of $O(\lg n/\lg\lg n + k)$.

What remains is to map the indices in the lists $Y_\ell$ to the actual coordinates of the corresponding points.
Using the $\pi$ arrays, this can be done by repeatedly determining the position of $L_v[i]$ in $L_{p(v)}$, where $p(v)$ is the parent of $v$. Doing this
for all $O(\lg n/\lg\lg n)$ levels of the tree allows one to deduce the global y-rank of the point corresponding
to $L_v[i]$. Storing an additional $O(n)$-sized array mapping global y-ranks to the corresponding points gives
a total running time of $O((1 + k)\lg n/\lg\lg n)$. To speed this up, we use the Ball-Inheritance structure
of […]. For completeness, we describe how this data structure is implemented in terms of the $\pi$ arrays we
have defined. Let $B \ge 2$ be a parameter. For every level $j$ in the base tree $T$ that is a multiple of $B^i$, for
$i = 0, \ldots, \lg_B \lg_\Delta n$, we let all nodes $v$ at level $j$ store the following array:
$\pi_v^i(j)$: For each $j$, $1 \le j \le n_v$, stores the index of $L_v[j]$ in $L_{u(v)}$. Here $u(v)$ is the ancestor of $v$ at the
nearest level that is a multiple of $B^{i+1}$ (excluding possibly the level storing $v$). This can be supported
by constructing the select data structure of Lemma 1 on the bit-vector $X$, where $X[j] = 1$ if and
only if $L_{u(v)}[j]$ is in $L_v$. A query $\pi_v^i(j)$ becomes $\mathrm{select}(j)$. The space usage for $\pi_v^i$ becomes

$$O\big(n_v \lg(n_{u(v)}/n_v)\big) = O\big(n_v \lg(\Delta^{B^{i+1}})\big) = O(n_v B^{i+1} \lg \Delta) \text{ bits.}$$
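The jump structure implied by the arrays $\pi_v^i$ can be modeled on levels alone. This sketch makes two assumptions. First, levels are numbered by depth, with the root at level 0. Second, from a node at level $j > 0$ one uses the array for the largest $i$ with $B^i \mid j$, which leads to level $\lfloor j/B^{i+1} \rfloor B^{i+1}$. Each jump raises the power of $B$ dividing the current level, so at most $\lfloor \log_B h \rfloor + 1$ jumps reach the root from any level $j \le h$. The sketch models the level sequence only; it does not implement the select structures, and the names are hypothetical.

```python
def next_level(j: int, B: int) -> int:
    """One jump from level j > 0 using the array for the largest i such that
    B**i divides j: go to the nearest level above that is a multiple of
    B**(i+1)."""
    if j <= 0 or B < 2:
        raise ValueError("need j > 0 and B >= 2")
    power = 1
    while j % (power * B) == 0:  # find B**i, the largest power of B dividing j
        power *= B
    step = power * B             # B**(i+1)
    return (j // step) * step


def jump_path(j: int, B: int) -> list:
    """Levels visited when jumping from level j up to the root (level 0)."""
    path = [j]
    while path[-1] > 0:
        path.append(next_level(path[-1], B))
    return path


def floor_log(h: int, B: int) -> int:
    """Largest t with B**t <= h, computed without floating point."""
    t = 0
    while B ** (t + 1) <= h:
        t += 1
    return t


print(jump_path(7, 2))   # [7, 6, 4, 0]
print(jump_path(23, 3))  # [23, 21, 18, 0]
```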

Given an index $i$ into $L_v$, we can now recover $L_v[i]$ by using the $\pi$ arrays to first jump $B$ levels up, then
$B^2$ levels up, and so forth. The number of jumps becomes $O(\lg_B \lg_\Delta n)$, and hence we get a query time of
$O(\lg n/\lg\lg n + k \lg_B \lg_\Delta n)$. The total space usage for all $\pi$ arrays becomes

$$\sum_{i=0}^{\lg_B \lg_\Delta n} \frac{\lg_\Delta n}{B^i} \cdot n B^{i+1} \lg \Delta = O(n \lg n \cdot B \lg_B \lg_\Delta n)$$

bits. Setting $B = \lg^\varepsilon n$ for an arbitrarily small constant $\varepsilon > 0$ gives a data structure with query time
$O(\lg n/\lg\lg n + k)$ and space usage $O(n \lg^\varepsilon n)$ words. Setting $B = 2$ gives a data structure with query
time $O(\lg n/\lg\lg n + k \lg\lg n)$ and space usage $O(n \lg\lg n)$ words, as claimed.
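The two parameter choices can be checked by substituting into the bounds above. The check uses two facts: the base tree has height $\lg_\Delta n = O(\lg n/\lg\lg n) \le \lg n$, and a word holds $\Theta(\lg n)$ bits.

```latex
\begin{aligned}
B = \lg^{\varepsilon} n:&\quad \lg_B \lg_\Delta n \le \frac{\lg\lg n}{\varepsilon\lg\lg n} = \frac{1}{\varepsilon} = O(1),
  \qquad \text{query } O\big(\lg n/\lg\lg n + k\big),\\
&\quad O\big(n\lg n \cdot \lg^{\varepsilon} n \cdot \tfrac{1}{\varepsilon}\big)\ \text{bits} = O\big(n\lg^{\varepsilon} n\big)\ \text{words};\\
B = 2:&\quad \lg_2 \lg_\Delta n \le \lg\lg n,
  \qquad \text{query } O\big(\lg n/\lg\lg n + k\lg\lg n\big),\\
&\quad O\big(n\lg n \cdot 2\lg\lg n\big)\ \text{bits} = O(n\lg\lg n)\ \text{words}.
\end{aligned}
```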